The relationship between HRR-based similarity and similarity based on structural kernels

Author

  • Tony A. Plate
Abstract

Work in machine learning on kernel-based methods over discrete structures, such as strings and trees, uses a variety of kernels to measure similarity between structures (Haussler, 1999; Collins and Duffy, 2002; Bod, 1998). For example, a kernel for strings could count the number of matching substrings, and a kernel for trees could count the number of matching subtrees. A kernel is always a dot product between two feature vectors, i.e., a function K where K(x, y) = h(x) · h(y) = ∑_i h_i(x) h_i(y). The function h is some mapping of structures onto numerical vectors: h_i(x) is the value of the ith feature of structure x. For example, in a kernel for strings, h_i(x) could be the number of times the ith substring (in some enumeration of all substrings in the data set of interest) occurs in string x. What makes kernel-based methods practical is that there are efficient methods for computing K(x, y) that do not require h(x) (or h(y)) to ever be explicitly represented or even enumerated. This is fortunate, as the length of h-vectors can be exponential in the size of the structures being represented; even the number of non-zero elements in h(x) can be exponential in the size of x. These efficient algorithms are typically based on dynamic-programming techniques and calculate K(x, y) in time polynomial in the size of x and y. For some kernels there are even algorithms with quadratic-time and linear-time complexity (e.g., Vishwanathan and Smola, 2003; Lodhi et al., 2000). Kernel-based machine learning algorithms have shown some degree of success in challenging tasks, e.g., text classification (Lodhi et al., 2000) and natural language parsing (Collins and Duffy, 2002). Hence it is interesting to see how...
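
As an illustration of the dot-product view of a kernel, the following minimal Python sketch computes a simple string kernel in two ways. It assumes one particular choice of features that is not taken from the abstract: h_i(x) counts the occurrences of the ith non-empty contiguous substring. The function names (`substring_features`, `kernel_explicit`, `kernel_dp`) are hypothetical, and the dynamic program shown is a basic quadratic-time scheme, not the specific algorithm of any cited paper.

```python
from collections import Counter


def substring_features(s):
    """Explicit feature vector h(s): count of every non-empty contiguous substring."""
    return Counter(s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1))


def kernel_explicit(x, y):
    """K(x, y) = sum_i h_i(x) * h_i(y), computed by enumerating the features."""
    hx, hy = substring_features(x), substring_features(y)
    return sum(count * hy[sub] for sub, count in hx.items())


def kernel_dp(x, y):
    """Same kernel value, computed without ever building h(x) or h(y).

    lcp[i][j] is the length of the longest common prefix of x[i:] and y[j:].
    Each pair of start positions (i, j) contributes one matching substring
    pair for every length from 1 to lcp[i][j], so K is the sum of all lcp values.
    """
    lcp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    total = 0
    for i in range(len(x) - 1, -1, -1):
        for j in range(len(y) - 1, -1, -1):
            if x[i] == y[j]:
                lcp[i][j] = lcp[i + 1][j + 1] + 1
                total += lcp[i][j]
    return total


print(kernel_explicit("abab", "ab"), kernel_dp("abab", "ab"))  # both give 6
```

Under this assumed feature set, the explicit version stores a feature vector per string, while the dynamic program only needs the table of common-prefix lengths and runs in time proportional to the product of the two string lengths.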


Similar resources

A novel method for detecting structural damage based on data-driven and similarity-based techniques under environmental and operational changes

The applications of time series modeling and statistical similarity methods to structural health monitoring (SHM) provide promising and capable approaches to structural damage detection. The main aim of this article is to propose an efficient univariate similarity method named as Kullback similarity (KS) for identifying the location of damage and estimating the level of damage severity. An impr...


Providing a Link Prediction Model based on Structural and Homophily Similarity in Social Networks

In recent years, with the growing number of online social networks, these networks have become one of the best markets for advertising and commerce, so studying these networks is very important. Most online social networks are growing and changing with new communications (new edges). Forecasting new edges in online social networks can give us a better understanding of the growth of these networ...


HESITANT FUZZY INFORMATION MEASURES DERIVED FROM T-NORMS AND S-NORMS

In this contribution, we first introduce the concept of metrical T-norm-based similarity measure for hesitant fuzzy sets (HFSs) by using the concept of T-norm-based distance measure. Then, the relationship of the proposed metrical T-norm-based similarity measures with the other kind of information measure, called the metrical T-norm-based entropy measure, is discussed. The main feature ...


A Novel Image Structural Similarity Index Considering Image Content Detectability Using Maximally Stable Extremal Region Descriptor

Image content detectability and image structure preservation are closely related concepts with an undeniable role in image quality assessment. However, most image quality studies have concentrated on evaluating image structure, and few have focused on image content detectability. Examination of image structure was first introduced and assessed in the Structural SIMilarity (SSIM) measu...


Machine Cell Formation Based on a New Similarity Coefficient

One of the designs of cellular manufacturing systems (CMS) requires that a machine population be partitioned into machine cells. Numerous methods are available for clustering machines into machine cells. One method involves using a similarity coefficient. Similarity coefficients between machines are not absolute, and they still need more attention from researchers. Although there are a number o...




Publication date: 2003